Linear-scaling time-dependent density-functional theory in the linear response 

formalism 
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We present an implementation of time-dependent density-functional theory (TDDFT) in the lin- 
ear response formalism enabling the calculation of low energy optical absorption spectra for large 
molecules and nanostructures. The method avoids any explicit reference to canonical representations 
of either occupied or virtual Kohn-Sham states and thus achieves linear-scaling computational effort 
with system size. In contrast to conventional localised orbital formulations, where a single set of 
localised functions is used to span the occupied and unoccupied state manifold, we make use of two 
sets of in situ optimised localised orbitals, one for the occupied and one for the unoccupied space. 
This double representation approach avoids known problems of spanning the space of unoccupied 
Kohn-Sham states with a minimal set of localised orbitals optimised for the occupied space, while 
the in situ optimisation procedure allows for efficient calculations with a minimal number of func- 
tions. The method is applied to a number of medium sized organic molecules and a good agreement 
with traditional TDDFT methods is observed. Furthermore, linear scaling of computational cost 
with system size is demonstrated on a system of carbon nanotubes. 
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I. INTRODUCTION 

In recent years, there has been increasing interest in 
the optical properties of nanomaterials. Nanostructured 
materials have potential applications in photovoltaics 
and photoelectrochemical cells [IHj] as well as uses as op- 
tical sensors [HJ . Quantum confinement and surface effects 
play a crucial role in the electronic properties of these 
materials @, while their large number of atoms makes 
them much more challenging to treat with conventional 
electronic structure methods than their bulk counter- 
parts. It is therefore vital to develop efficient ways of 
computing optical properties of large scale systems to 
high accuracy. 

Timc-dcpcndcnt (TD) density-functional theory 
(DFT)0 has become a very successful method in 
recent years in determining excitation energies and 
optical spectra of molecules and nanoclusters pj-fiol]. 
with excitation energies typically being predicted to 

TDDFT is appealing 



within a few tenths of an eV 11 



for large scale applications since it is computationally 
cheaper than more complicated many body techniques 
like the GW approximation and the Bethe Salpeter 
equation [10]. Continuous improvement in TDDFT 
algorithms over recent years |12j means that calculations 
on hundreds of atoms are now computationally feasible. 
However, even though TDDFT is computationally 
cheaper than more advanced methods, it still exhibits a 
cubic scaling behaviour with system size in conventional 
implementations, putting a severe limitation on the 
system sizes that can be studied. In ground-state cal- 
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dilations with density- functional theory (DFT) 13 , [TiJ , 
the development of linear-scaling methods [TEL 16 1 has 
been specifically aimed at enabling the treatment of 
large scale systems with up to hundreds of thousands of 
atoms [13 • Linear-scaling DFT calculations have been 
performed on large biomolecules and nanoparticIes(18j. 
Thus ideally, one would like to extend the ideas of linear 
scaling which have proved to be so successful in ground 
state DFT to excited state calculations in TDDFT. 

Fully linear-scaling formulations of TDDFT have been 
known for almost a decade [l{|. However, these formu- 
lations rely on propagating the TD Kohn-Sham equa- 
tions explicitly in time. The time-dependent response 
of the system to an external field can be determined at 
any instance, which, after a Fourier transform into the 
frequency domain, contains information about the fre- 
quenc y d ependent-response and thus the optical spec- 
trum [10]. To ensure stability of the solution, the time 
step to integrate the TD Kohn-Sham equations is cho- 
sen to be quite small (typically of the order of 10~ 3 fs) 
and thus the number of time steps required to obtain 
a meaningful spectrum becomes prohibitively large for 
many practical applications [12]. Furthermore, in time 
domain TDDFT implementations, one loses any explicit 
information on individual excitations, as well as the abil- 
ity to compute dipole-forbidden states. Only the spec- 
trum as a whole can be resolved [19j. 

For many of the applications mentioned above, one is 
mainly interested in the low energy optical response of 
the system, with energies in the region of visible and 
low energy ultraviolet light. Additionally, properties of 
individual excitations such as oscillator strengths and re- 
sponse density distributions are important for analysing 
the spectrum and optimising spectral response for spe- 
cific applications. A method which lends itself naturally 
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to computing low energy excitations of a system is the 
linear response formalism in which the TDDFT 

equations are cast into an effective eigenvalue equation 
that can be solved for its lowest eigenvalues [H, [HI ■ 
In principle, this formalism can also be made linear scal- 
ing [22j , making it ideal for the large scale nanostructured 
systems we have in mind. 

In this paper, we present a fully linear-scaling imple- 
mentation of TDDFT in the linear response formalism, 
suitable for calculating the low energy excitation ener- 
gies and spectrum of large systems. We will first give a 
brief overview of both linear-scaling DFT in the ONETEP 
code [3 (SectionHLgJ and linear response TDDFT (Sec- 
tion [HE])) mentioning only features that are important 
for our formalism. We will then present an outline of 
various aspects of the linear-scaling TDDFT formalism, 
making use of a double representation approach to rep- 
resent the occupied and unoccupied Kohn-Sham space 
f Sections III CHII E| . We will present results on a number 
of test systems (Sections IIII AllHl CP . as well as a demon- 
stration of the linear scaling of the computational effort 
with system size (Sections IIII Dl and IIII Ep . Our conclu- 
sions are summarised in Section IIVI 

II. METHODOLOGY 

A. Linear-scaling density functional theory in 
DNETEP 

All linear-scaling DFT formalisms are developed 
around the idea of exploiting nearsightedness [23]: This 
principle states that for any system with a band gap, the 
single particle density matrix decays exponentially with 
distance 0, HEf- A variety of different linear scaling 
methods based on this principle have been developed in 
recent years and have been reviewed extensively (la . [l6| . 

In ONETEP the density matrix is expressed through a 
set of optimisable localised functions {4> a } referred to as 
nonorthogonal generalised Wannier functions (NGWFs) 
[2^ 1 . The NGWFs are expanded in an underlying basis 
of periodic sine functions (psincs) [27| equivalent to a set 
of plane waves. The density matrix is then written in 
separable form (28j 

occ 

p(r,r') = ^^ KS (r)^ KS *(r') = «k(r)p{"><^(r') (1) 

V 

where we assume an implicit summation over repeated 
Greek indices. {pM«/3} 

are the elements of the valence 
density matrix in the representation of duals of NGWFs. 
Locality is imposed through a spatial cutoff on the den- 
sity matrix and a strict localisation of the NGWFs. The 
total energy of the system is minimised both with respect 
to the density matrix and the NGWFs. The underlying 
psinc basis of the NGWFs allows the method to achieve 
an accuracy equivalent to plane- wave methods (29| . The 
in situ optimisation of the NGWFs during the calculation 



ensures that only a minimal number of {4> a } are needed 
to span the occupied subspace. 

In a ONETEP calculation, there is no reference to in- 
dividual Kohn-Sham eigenstates in their canonical rep- 
resentation. Eigcnstates can be obtained in a post- 
processing step by a single diagonalisation of the DFT 
Hamiltonian in NGWF representation. Due to the mini- 
mal size of the set of NGWFs needed to represent the oc- 
cupied subspace, this diagonalisation is generally cheap, 
but does not scale linearly with system size. Occupied 
states are accurately represented by {4>a}, however, un- 
occupied states are reproduced increasingly poorly with 
increasing energy [3(|. In general, the specific optimi- 
sation of {0 Q } in order to represent the occupied space 
leads to poor representation of the conduction space man- 
ifold. 

This shortcoming was addressed recently [3fJ| in a 
method where a second set of NGWFs {xp} is optimised 
in a non-self-consistent calculation following the determi- 
nation of the ground-state. The method uses a Hamilto- 
nian that projects out the occupied states and minimises 
the energy with respect to a second conduction density 
matrix and the set of NGWFs {xp} m order to rep- 
resent the low energy subspace of the conduction man- 
ifold. The conduction density matrix is then expressed 
using the conduction NGWFs: 

p< c >(r,r / ) = J2€ S (r)€ S *(r') = Xa(r)P {c}a ^X*p(^ 

C 

Here, we use the subscript c to denote conduction Kohn- 
Sham states and N c to denote the number of Kohn-Sham 
conduction states that is optimised to represent. 

The optimisation of both P^ c ^ and {xa} scales linearly 
with system size. As in the ground-state calculation, the 
individual Kohn-Sham eigenstates can be calculated from 
a single diagonalisation of the Hamiltonian in conduction 
NGWF representation if needed. The obtained conduc- 
tion states are shown to be in excellent agreement with 
traditional plane- wave DFT implementations 30}. Thus 
the NGWF approach allows the representation of both 
the occupied space and a low energy subset of the un- 
occupied space to plane-wave accuracy using two inde- 
pendently optimised sets of localised functions. The un- 
derlying psinc basis allows for a systematic improvement 
of the NGWFs and the individual optimisations ensure 
that only a minimal set of {4> a } and {xp} have to be 
used in order to represent the valence and conduction 
space. In contrast to methods making use of a single set 
of localised orbitals, the double NGWF approach also 
allows for keeping a strict localisation on {4> a } repre- 
senting the valence space, while for {xp} a larger local- 
isation radius can be chosen. These features make the 
conduction and valence NGWFs ideal for the applica- 
tion to the linear response TDDFT formalism, provided 
only low energy excitations are of interest. The main 
limitation of the NGWF representation is the inability 
to express high energy delocalised, unbound conduction 
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states. Thus only excitations that contain no significant 
fractions of transitions into unbound states are expected 
to be representable using the NGWFs. 



B. The linear response TDDFT formalism 

In recent years, a number of reviews on different as- 
pects of TDDFT have been published d-G^- In general, 
one differentiates between two main formalisms: The lin- 
ear response formalism, which can be cast into an effec- 
tive eigenvalue equation and the time propagation for- 
malism, in which the time-dependent Kohn-Sham equa- 
tions are propagated explicitly. Linear response TDDFT 
has become the method of choice for calculating low 
energy excitations and spectra and is now widely used 
9, 10]. In the linear response regime, the excitation en- 
ergies can be expressed as the solution to the eigenvalue 
equation |9J 



A B 



1 

-1 



(3) 



where the elements of the block matrices A and B can 
be expressed in canonical Kohn-Sham representation as 



&c,c' &v ,v' 



KS 



,KS\ 



(4) 
(5) 



Here, c and v denote Kohn-Sham conduction and valence 
states and K is the coupling matrix with elements given 
by 



Kr 



d 3 rdV 



1 



5p(r)Sp(r') 



,{0} 



x^(r)^(r)^(r')^(r')- (6) 



In the above expressions, we have omitted all spin in- 
dices for convenience. Furthermore, the coupling matrix 
is taken to be static, a simplification that is known as the 
adiabatic approximation. E xc is the exchange-correlation 
energy and its second derivative, evaluated at the ground- 
state density p^ of the system, is known as the TDDFT 
exchange-correlation kernel. As in ground state DFT, its 
exact functional form is not known. A commonly made 
choice is to use i? xc = £™, which is known as the adi- 
abatic local density approximation (ALDA). 

A further simplification to the TDDFT eigenvalue 
equation can be achieved by making use of the Tamm- 
Dancoff approximation (TDA) |3lj |. In this approxima- 
tion, we assume the off-diagonal coupling matrix ele- 
ments B cv y v i to be small. The matrix equation then 
simply reduces to 



AX = wX 



(7) 



a matrix eigenvalue problem of half the size of the origi- 
nal one. More crucially, the TDDFT eigenvalue equation 



in the TDA is Hermitian, while the original equation is 
not [lfj. Generally speaking, the TDA gives good exci- 
tation energies but violates oscillator strength sum rules 
@. However, due to its Hermitian properties, the TDA 
lends itself to solutions involving standard matrix eigen- 
value solvers and will therefore be considered for the rest 
of this work. 

In principle, the matrix A can be built explicitly and 
Eq. [7]can be diagonalised to give all excitation energies of 
the system. Clearly, this is not possible with linear scal- 
ing effort, as the dimensions of A grow as 0(N 2 ) with 
system size and the matrix is not sparse in the canoni- 
cal representation. Since every matrix element involves 
a double integral over product Kohn-Sham states, con- 
structing A scales as 0(N 6 ) [32j . However, in the limit 
of large systems when one is only interested in a com- 
paratively small number of eigenvalues, it is much more 
advantageous to use iterative methods instead of direct 
diagonalisation to calculate the eigenvalues of A. In or- 
der to do so one needs to define the action of A on an 
arbitrary trial vector x. Following the formalism intro- 
duced by Hutter [2(| we define 



0) =^2^c( r ) x cv^( r ) 



(8) 



where p^(r) is the first order response density associ- 
ated with the trial vector x. Defining the self-consistent 
field potential VgQp(r) as a reaction to the response den- 
sity as 



V {1} (r) 



dV 



|r - r'| 

S 2 E V 



5p(r)5p(r>) 



p«(r') (9) 



,{0} 



the action q of the TDDFT operator A on the arbitrary 
trial vector x can be simply written as 



Here, (V<£>) is given by 



-4, 



c'v' 



e KS x 



CV . C V' lV C V' 



x e KS + ( y{l}^ 



SCf)cu 



d 3 r^(r)V^>(r)^(r) 



(10) 



(11) 



One can then express the lowest excitation energy w n 
of a system in terms of q cv 



(12) 



which can be minimised variationally with respect to x. 

Although the approach above is outlined in the canon- 
ical representation, it can be reformulated in terms of 
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local orbitals or other basis functions. In many quan- 
tum chemistry codes, Vgcp is constructed in a Gaus- 
sian basis set representation, making use of highly opti- 
mised methods to perform two centre Gaussian integrals 
[2H HH . Plane wave implementations typically make use 
of a mixed representation of canonical orbitals for the 
occupied states and plane waves for the virtual states 
[H, ■ The main advantage of all these iterative meth- 
ods is that no explicit construction and storage of A 
is required, which is prohibitive for large system sizes. 
However, the different basis set implementations men- 
tioned above still make reference to individual Kohn- 
Sham states, thus calculating q still shows an asymptotic 
scaling of 0(N 3 ) with system size (32j. To improve the 
scaling, one has to avoid any reference to the canonical 
representation}^ . 

C. Linear-scaling linear response TDDFT 

ONETEP provides a set of optimised NGWFs {xa} span- 
ning the low energy conduction space and {4>p} spanning 
the valence space. Together, they form a suitable repre- 
sentation to expand quantities like p^ and Vg^p. In the 
following section, for all expressions including the sets 
of localised NGWFs, we will differentiate between co- 
variant and contravariant quantities by using lower and 
upper case greek indices respectively. For quantities in- 
volving the canonical Kohn-Sham states, this differenti- 
ation is unneccessary since the Kohn-Sham orbitals form 
an orthogonal basis. For a more in depth treatment of 
tensors in electronic structure theory, see [H, [34| . The 
Kohn-Sham orbitals are used in this section to derive the 
appropriate expressions in NGWF representation, as well 
as to highlight the equivalence to the canonical represen- 
tation. Note however, that there is no explicit reference 
to the canonical representation in the final expressions. 

Starting with the response density, we can write 

c,v 

occ opt 

= EE^i^)^ Q i^ KS )^^ KS i^)^i r )- ( 13 ) 

V c 

Here, the sum of the conduction states goes over all the 
states for which {xa} was optimised. We have again 
assumed an implicit summation over repeated greek in- 
dices. In principle, one has to sum over an infinite num- 
ber of conduction states. However, for the lowest few 
optical energies in the system, p^ is well described by 
a relatively small number of unoccupied states. This ap- 
proximation can be rigorously tested by including a larger 
subset of the conduction space manifold in the optimisa- 
tion of the conduction density matrix P^. In the spirit 
of the linear scaling DFT formalism the above expression 
can be rewritten as 

p«(r)= XQ (r)P«<*%(r) (14) 



where the effective response density matrix pi 1 !"^ is de- 
fined as 

occ opt 

p{1}aP - EE^ c K V^ KS i/>. (15) 

V c 

The above definition is analogous to the definitions of 
the valence and conduction density matrices in NGWF 
representations, where 

opt 

(p {c} r p = E(x^ c KS }<^ c K V> (i6) 

c 

occ 

{P M)^ = Y / (r\€ s )(€ s \^)- (17) 

V 

Eq. [15] defines the full response density matrix in 
mixed conduction-valence NGWF representation. Each 
TDDFT excitation energy can be written as a functional 
of a specific response matrix and thus plays the 

same role in the linear-scaling linear response formula- 
tion as the eigenvector x does in the canonical formula- 
tion outlined in the previous section. 

Similarly to the response density, (Vqqp) cv can be 
rewritten as 

(V s { clU = (^ S \x a )(x a \V s { cl\M(^\^ S )- (18) 

Furthermore, the diagonal part of q cv consisting of Kohn- 
Sham conduction- valence eigenvalue differences becomes: 

opt 

e™x cv - x cv e™ = E(^ KS |x a }(x Q |^lx^}(x^l^ S >^, 

c' 

OCC 

- \n (K \H \<j> p ) |^ KS X19) 

v' 

It is now convenient to introduce a shorthand notation for 
the matrix elements of different quantities in terms of the 
different types of NGWFs. We denote the Kohn-Sham 
Hamiltonian in conduction and valence NGWF represen- 
tations as H x and respectively and the self consistent 
field response in mixed conduction-valence NGWF rep- 
resentation as Vg^^: 

(HX) a0 = ( Xa \H\xp) (20) 

(H^ap = (4a\H\<l>p) (21) 

= (XoI^scfI^)- (22) 
By inserting Eq. [TO] and Eq. [TH into Eq. [TO1 mul- 
tiplying with (x 01 ^^) and (V^ cs |</> /3 ) from the left and 
right respectively and summing over the c and v indices, 
one can remove all references to the canonical representa- 
tion from q. Using the definition of the response density 
matrix pl 1 -^, the result of the TDDFT operator acting 
on a trial response matrix P^ in NGWF representation 
reduces to the simple form 

+ (P {c} V^l x *P^) afl . (23) 



5 



Note that in the linear-scaling formalism employed in 
QNETEP, H x , H , P {c} , P M and are all sparse 

matrices for sufficiently large system sizes (35j |. Further- 
more, the response potential Vg^ F (r) is a functional of 
the response density only. Constructing from Eq. [H] 
only requires information from density matrix elements 
pi 1 }"/ 3 f or which (xq|<^s) and therefore scales lin- 
early with system size even for fully dense Evalu- 
ating Vgc F (r) from Eq. Oalso scales linearly for any semi- 
local exchange-correlation functional. Thus constructing 
Vggp ^ scales linearly with system size for fully dense 
P^. However, in evaluating the matrix operations in 
Eq. [23j linear scaling can only be achieved if the re- 
sponse density matrix is truncated, just like the density 
matrix in linear-scaling DFT. If this truncation can be 
performed, the response density matrix becomes sparse 
for sufficiently large systems and evaluating the action 
of the TDDFT operator on an arbitrary response matrix 
P^ 1 ^ scales linearly with system size. 

Using the action of the TDDFT operator in NGWF 
representation defined in equation 1231 one can then 
rewrite the lowest excitation energy of the system as 



Tr 



mm 
p{i} 



p{i}t s x q x0 S " 



Tr 



p{i}t s xp{i} s 4 



(24) 



Here, S x and denote the conduction and valence 
NGWF overlap matrices given by (S x ) a p = (x<x\xp) an d 
{S^) a /3 = (</>a|<^)- Using the definitions of the involved 
quantities, as well as the invariance of the trace operation 
under cyclic permutation, it is trivial to show that Eq. [M] 
is equivalent to Eq. [T2]in the canonical representation. 



D. The algorithm 

In order to calculate the N u lowest excita- 
tion energies of a system with response density 

matrices •fp| 1 ^;i = 1, ..JV W j and corresponding 
|q x< ^; i = 1, ...A^ j, we define the function 



£<* = E 



Tr 



>{i}t 



gX q X0g0 



Tr 



(25) 



which can be minimised with respect to -^I-*^ 1 ^ ^ under 
the constraint 



Tr 



>{i}t 



gxp{i}g<? 



= 5 t 



(26) 



Again using the expression for |pj^ j and the invari- 
ance of the trace under cyclic permutations, it is clear 
that the above constraint is equivalent to the require- 
ment that eigenvectors of the canonical TDDFT eigen- 
value problem (Eq. [7]) are orthonormal to each other. 



When n is minimised, i-P} 1 ^} span the same subspace 
as the lowest eigenvectors of the TDDFT operator 
A. In this work, the minimisation of Q is achieved us- 
ing a conjugate gradient algorithm with Gram-Schmidt 
orthonormalisation. 

Differentiating ft with respect to Pj 1 ^ one can find the 
(covariant) gradient orthogonal to all current (contravari- 
ant) trial response matrices {P^} [36] 



3 



Tr 



p{l} S X y X0 g 



{gtU = (s^U( q fr s (s%p 

' (5 x ) Q7 (p/ 1} )^(5*) 5/3 (27) 



Operating on the left and right with the inverse conduc- 
tion and valence overlap matrices, the covariant gradient 
can be transformed into a contravariant gradient 



3 



Tr 



[pj l} V<^s* 



{l}\a/3 



(PI 



(28) 



which can be used as a steepest descent search direction 
for a conjugate gradient algorithm. 

The exact form of the conjugate gradient algorithm 
used here has been outlined elsewhere [36[ (with the dif- 
ference that we do not make use of any prcconditioncr). 
Here we just focus on how to choose a suitable starting 
guess for {P, {1} }. Since we do not have individual Kohn- 
Sham states available in the linear scaling formalism of 

{i} 



the ground state calculation, we cannot initialise P, 
to conduction-valence product Kohn-Sham states close 
to the band gap, which would otherwise form reasonable 
starting guesses. Instead we initialise the set of {P|^} 
to random starting configurations (for other possible ini- 
tialisation schemes, see 122J). However, from Eq. [TBI it 
can be seen that any valid response density matrix must 
be invariant under the operation 



p{i}' _ p{ c }gxp{i}g0pM _ p{l} 



(29) 



This operation can be understood as a projection into 
conduction and valence Kohn-Sham states in their 
NGWF representation. Response density matrices that 
violate invariance under this projection contain elements 
that would correspond to forbidden transitions between 
two occupied or two unoccupied states, or contain con- 
tributions from unoptimised and thus badly represented 
high energy conduction states. The invariance require- 
ment follows from an expansion of the density ma- 
trix idempotency constraint to first order for a given 
perturbation]!?) and must thus be fulfilled for all first 
order response density matrices. The need to enforce the 
idempotency constraint explicitly via the projection of 
Eq. [5H]can be viewed as the price to be paid for moving 
away from a formulation involving the canonical repre- 
sentation. 

The invariance requirement can be enforced by project- 
ing the starting guess response matrices with P^S X and 
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g<£pM f rom the left and the right respectively. From Eq. 
US it can be seen that q x *, the result of the TDDFT op- 
erator acting on a valid trial response density matrix, au- 
tomatically shows the same invariance property as P^ 1 ^. 
Therefore all gradients {gj~} constructed using a valid set 

of {P|^} obey the invariance requirement by construc- 
tion. Thus, every conjugate gradient derived from {g^} 
will have the specified invariance property and updating 
a valid response matrix with a gradient will preserve the 
invariance of that matrix under the projection (Eq. I29[) . 

The orthogonality condition of Eq. [5H]is enforced using 
a Gram-Schmidt procedure, which has a nominal scaling 
of 0(N2 jv c ngwf JV„ ngwf ), with 7V C NGWF and N* GWF be- 
ing the number of conduction and valence NGWFs re- 
spectively. Both jV c NGWF and N^ GWF grow as O(N) 
with system size, giving an overall scaling of 0(N 2 ) with 
system size for the orthonormalisation procedure. How- 
ever, if pW is truncated and thus sparse, the scaling of 
the Gram-Schmidt orthonormalisation reduces to O(N), 
with a prefactor dependent on the square of the number 
of excitation energies N u . 

Thus, the whole algorithm outlined above scales lin- 
early in memory with the number of excitation energies 
to solve for. Since the N u individual resonse density 
matrices {Pj 1 *} have to be kept orthogonal to each other 
using a Gram-Schmidt procedure, the asymptotic scaling 
of computational cost with the number of excitation en- 
ergies is O(N^). However, for a fixed number of states 
required, the algorithm scales as O(N) with system size 
in both memory requirements and computational cost. 



E. Truncation of the response density matrix 

Since the algorithm developed in the previous sections 
only exhibits true linear-scaling properties if all involved 
density matrices P^, and can be truncated, 
one has to justify that the truncations are indeed possi- 
ble. The truncation of originates from the near- 
sightedness principle [23| and forms the basis of any 
linear-scaling DFT implementation. In insulating sys- 
tems, P^ can be shown to decay exponentially with 
distance [38|. For the conduction states, P^ is only ex- 
pected to exhibit an exponential decay if there is a second 
energy gap in the conduction band and P^ spans the 
manifold of conduction states between the two bandgaps. 
In this case, the same argument to show exponential de- 
cay of the ground-state density matrix can be applied to 
P^ [38| . Furthermore, by the same argument, the joint 
density matrix spanning the manifold defined by both 
p{"} and pw must be exponentially localised. The joint 
density matrix can be written as a block diagonal matrix 
with P'"' and p' c ' as its diagonal entries. Any response 
density matrix P^} due to the application of a small 
perturbation described in this work corresponds to the 
off-diagonal blocks of said joint density matrix. However, 
the application of a small perturbation cannot break the 



disentanglement of the joint manifold of P^ and 
from the rest of the conduction manifold and thus can- 
not break the exponential localisation of the joint block 
density matrix. The joint block density matrix can only 
be exponentially localised if all its constituent blocks are 
exponentially localised. We thus conclude that, in the 
special case described here, the TDDFT response den- 
sity matrix P^ 1 ^ is indeed expected to be exponentially 
localised. 

The desired property of exponential localisation of the 
conduction density matrix and thus of the response den- 
sity matrix can most likely be realised in ID systems 
and molecular crystals, where the bands show little dis- 
persion. However, it is evident from the above consid- 
erations that one cannot present a generalised argument 
that P^ 1 } can be truncated for all systems. This limi- 
tation is not unique to the linear response formulation 
of TDDFT presented here, but applies to linear-scaling 
time domain TDDFT as well, where the time-dependent 
response density matrix is truncated without a general 
formal justification. It was however noted by Yam et 
a/.[ll| and Chen et a/.[39j], that for a number of systems 
studied the first order response density matrix retained 
the localisation of the ground-state density matrix to a 
good degree and thus could be truncated. In general, 
we expect this finding to be true for the relatively lo- 
calised excited states of a variety of systems. Whether a 
truncation of P^ can be achieved for very delocalised 
high-energy excitations is doubtful. However, since the 
method presented here is mainly aimed at low energy ex- 
citations of large systems, we expect that the truncation 
of both P {c} and P {1} can indeed be carried out in prac- 
tice for a certain class of systems and a linear scaling of 
computation time with system size can be achieved. 

Truncation of pW adds an additional complication to 
the algorithm in that the invariance relation of Eq. [25] 
only holds approximately. Thus the gradient g 1 " derived 
from a truncated P^} only approximately preserves the 
invariance property and the accumulation of errors can 
lead to instabilities in the convergence. To measure the 
variations of P^ 1 ^ from valid response matrices obeying 
the projection operation of Eq. [511 we define the positive- 
semidefinite norm Q P^ ; 



>{i} 



= Tr 



{i}t s xp{i} s 



{i}'t s xp{i}' s ^ 



(30) 

For fully dense matrices P^ 1 ^ initialised in the way de- 



scribed in the previous section, Q 



P {i} 



vanishes to nu- 



merical accuracy. For truncated response density matri- 



ces, Q 



P {i} 



can be forced to remain smaller than some 
threshold by iteratively applying the projection of Eq. [25J 
to pW after each TDDFT iteration, thus stabilising the 
algorithm. 
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ONETEP 


Ref. data [44] 


exp.[43] 




1.73(0.053) 


1.91(0.041) 


2.12(0.08) 


2 1 B 3u {x) 


3.77(0.038) 


3.95(< 0.001) 




?B 2u {y,P) 


3.86(2.46) 


4.24(3.346) 


4.00(2.20) 


3 1 B 2u (x) 


4.00 (0.010) 


4.36(0.003) 





TABLE I: Excitation energies and oscillator strengths for se- 
lected low energy transitions in Pentacene. Energies are given 
in eV, oscillator strengths in brackets 



III. RESULTS AND DISCUSSION 

In this section, we will assess the performance of the 
method outlined above, as implemented in the ONETEP 
code. We have tested the implementation on a number 
of small and medium sized molecules, before perform- 
ing a full demonstration of the linear scaling capabilities 
of the method. Unless specified otherwise, all calcula- 
tions are carried out using the LDA exchange correla- 
tion functional for the ground-state DFT calculations 
and ALDA for the TDDFT calculations, both in the 
Perdew-Zunger paramcterisation[40j. Norm conserving 
pseudopotentials[4l| are used throughout this work. 



A. Pentacene 

As the first test system we chose pentacene (C22H14), 
a medium sized polycyclic aromatic hydrocarbon. The 
simulation box was chosen to be 40 x 49 x 30 Oq and the 
kinetic energy cutoff was 750 eV. The atomic positions 
were optimised at the LDA level 42j . A minimal set of 1 
NGWF per H and 4 NGWFs per C atom was chosen for 
both the occupied and the unoccupied state representa- 
tions. The NGWF radius for both valence NGWF species 
was chosen to be 8.0 arj while 14.0 arj was chosen for the 
conduction NGWFs. The conduction density kernel was 
optimised for the 10 lowest unoccupied states, covering 
all of the bound unoccupied states. This put the dimen- 
sions of the TDDFT operator at 510 x 510 in a canonical 
representation and 10404 x 10404 in a representation of 
conduction and valence NGWFs. 

Table Q] shows selected low energy excitations as cal- 
culated with the method presented here in comparison 
to experiment [i^] and results obtained by Malloci et al. 

The reference results were obtained using 6-31+G* 
Gaussian basis set and the B3LYP exchange-correlation 
functional [HEl. We find that ONETEP ALDA con- 
sistently underestimates the experimental excitation en- 
ergies, and that the method in [44] using the B3LYP 
functional gives systematically higher excitation energies 
compared to the ALDA results. This shift in energy is 
expected to be largely due to the partial correction for 
electronic self-interaction in the B3LYP functional. How- 
ever, we also note that the Gaussian basis set used in [44| 
is optimised to describe the ground state energy, and thus 



likely to be of worse quality than the specifically opti- 
mised NGWFs used in ONETEP. 

The main purpose of choosing a small test system like 
pentacene is to systematically evaluate the convergence 
of the obtained excitation energies with respect to the 
two main parameters of the algorithm introduced here. 
The first is the limitation of to a small, low energy 
part of the conduction space, while the second is the lo- 
calisation of the conduction NGWFs. The convergence 
of the 2 1 B2u(y,(3) and S 1 B2 U (x) states with respect to 
the two parameters can be found in Fig. [1] We note that 
for the low energy excitations considered here, the re- 
striction of the unoccupied subspace to contain only well 
bound states is indeed a valid approximation. Including 
10 conduction states explicitly into is sufficient to 
converge the excitation energies to less than 5 meV. As 
expected, a generally larger NGWF radius is needed for 
the conduction NGWFs than for the valence NGWFs. 
However, we find that the same level of convergence is 
reached for a radius of 14 qq. 



B. Buckminsterfullerene 

As a second test system, we pick buckminsterfullerene 
(Ceo) which has already been studied extensively both 
experimentally and using ab initio simulation techniques. 
Again, we are only interested in the low energy part 
of the spectrum corresponding to visible and ultravio- 
let light. Calculations were performed in a simulation 
cell of 37.8 x 37.8 x 37.8 Oq, using a kinetic energy cutoff 
of 800 eV. A minimal number of 4 NGWFs was chosen 
for both conduction and valence representations, while 
the NGWF radius was chosen to be 13.0 clq and 8.0 a re- 
spectively. A total of 30 conduction states were explicitly 
included into the calculation. 

Ceo shows a high number of dark transitions in the low 
energy range, transitions for which the oscillator strength 
is very small. Thus to reproduce the main features of the 
spectrum up to an energy of 4.8 eV, 150 excitations had 
to be converged. The converged spectrum is shown in 
Fig. [2] The most prominent features of the spectrum are 
the strong excitation peaks at 3.46 eV and 4.42 eV, which 
are in good agreement to the TDDFT energies and oscil- 
lator strengths obtained in 48j using a gradient-corrected 
functional and a 6-31G+S Gaussian basis set. Further- 
more, the energies we obtained are in perfect agreement 
with the 3.5 eV and 4.4 eV obtained in time-propagation 
TDDFT calculations using a basis of linear combinations 
of atomic orbitals by Tsolakidis et al [13] • Experimen- 
tally, the peaks are reported to be at 3.78eV and 4.84 eV 
[4^ . in reasonable agreement with the TDDFT results. 

The calculations for Cqq also provide an opportunity 
to show the scaling of computational cost of the TDDFT 
calculation with the number of converged excitation ener- 
gies N u . Figure [3] shows the total calculation time versus 
the number of converged excitation energies as well as 
the total time taken in applying the TDDFT operator 



8 




8 10 12 

Number of conduction states 
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16 



FIG. 1: Energy convergence of the 2 1 B2 U {y, P) and 3 1 B2 U (x) states with respect to the number of conduction states explicitly 
included in pf c } and the localisation radius of the conduction NGWFs. 



ONETEP TDDFT 
Reference TDDFT 




Energy (eV) 



FIG. 2: Absorbtion spectrum of C60 generated from the 150 
lowest excitation energies. An artificial smearing width of 0.03 
eV was used in generating this plot. The TDDFT excitation 
energies and oscillator strengths of three major excitations 
were taken from [48j]. The two spectra were scaled according 
to their relative oscillator strengths. 




Number of excitation energies 

FIG. 3: Computation time vs. number of excitiation energies 
converged for Ceo- The red line is a parabolic fit to the total 
calculation time while the blue line is a linear fit to the total 
time taken to apply the TDDFT operator on the set of trial 
vectors. The non-linear behaviour of the total calculation 
time due to the orthogonalisation of multiple excitations is 
clearly visible. 



on the trial vector (Eq. [23]) . The cost of applying the 
TDDFT operator scales linearly with the number of ex- 
citation energies, as one would expect. However, it can 
be seen that for larger numbers of excitations, the 0(N%) 
scaling of the Gram-Schmidt orthonormalisation begins 
to dominate over the application of the TDDFT opera- 
tor and the total calculation time deviates from the linear 
trend. 



C. Chlorophyll 

In many ways, chlorophyll a (Cssh^Mg^O^) pro- 
vides an ideal application for the method outlined in this 
work. Although it is too small to fully exploit all ad- 
vantages of linear scaling with system size in both the 
DFT and TDDFT calculation, its size represents the up- 
per limit of systems that can be comfortably studied us- 
ing plane wave TDDFT implementations [12| . Due to its 
importance in photosynthesis, chlorophyll has been stud- 
ied in great detail both experimentally and in theoretical 
work using TDDFT. 

Calculations on chlorophyll were performed using a ki- 
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Energy (eV) 



FIG. 4: Absorbtion spectrum of chlorophyll a generated from 
the 12 lowest excitation energies compared with the experi- 
mental spectrum of chlorophyll a in diethyl ether [49^ . An 
artificial smearing width of 0.02 eV was used in producing 
the TDDFT results. The TDDFT intensities were scaled by 
a constant factor in order to compare with experiment. 



netic energy cutoff of 800 eV. A minimal number of 4 
NGWFs per N, H, C, O and Mg atom and 1 NGWF per 
H atom was chosen for the set of valence NGWFs, while 
for the conduction NGWFs, 13 and 5 where chosen per 
atom respectively. For the valence NGWFs, a radius of 
8.0 ao was chosen throughout, while for the conduction 
NGWFs, a radius of 12.0 an was chosen. The 15 lowest 
unoccupied states were explicitly optimised and included 
in the calculation. The resulting spectrum produced by 
the 12 lowest excitation energies in comparison to the ex- 
perimental spectrum of chlorophyll in diethyl ether [49| 
can be found in Fig. |H We note that the peak at 1.99 
eV appears blue-shifted, while the peak at 2.75 eV is 
red-shifted compared to the experimental data. The po- 
sitions of the two main excitation peaks are however in 
very good agreement with the 2.00 eV and 2.75 eV ob- 
tained by Sundholm in a TDDFT calculation using 
Gaussian basis sets and an ALDA functional. Further- 
more, the overall shape of the spectrum produced here 
appears to be in very good agreement with TDDFT cal- 
culations obtained by Rocca et al. [12J using the PBE 
exchange correlation functional and a plane-wave basis 
set. 

The main point that can be taken from the TDDFT 
calculation presented here is that almost the whole visi- 
ble spectrum of chlorophyll a, from 1.8 to 3.0 eV, can be 
generated by just calculating the first 12 excited states of 
the TDDFT superoperator. Since the number of states 
required is very small compared to the dimensions of the 
TDDFT operator, iterative methods based on linear re- 
sponse theory are much more efficient than calculations 
based on the time propagation of the time dependent 
Kohn-Sham equations. Thus, systems like chlorophyll a, 
where the low energy spectrum is completely dominated 
by a few very strong excitations and there is only a very 



small number of dark, dipole forbidden states, provide a 
perfect application for the method discussed in this work. 



D. GaAs nanorods 

The accuracy of the method with truncated density 
matrices is tested on a GaAs nanorod. A number of 
these nanorods with different terminations have already 
been studied in some detail (EH [HJ]. For our purposes 
here, we choose a nanorod with Hydrogen termination, 
consisting of a total of 996 atoms and having a length 
of 159 an. The calculations were performed at a kinetic 
energy cutoff of 400 eV and a minimal number of 4 NG- 
WFs per Ga and As atom and 1 NGWF per hydrogen 
atom was chosen for both sets of NGWFs. An NGWF 
localisation radius of 12 ao was chosen for all NGWFs. 
Since the purpose of the calculations on the nanocrystal 
was to establish the magnitude of errors introduced by 
the response density matrix only, we performed all calcu- 
lations with fully dense conduction and valence density 
matrices and only truncated pW to different degrees. 

The nanorods studied here exhibit a large dipole mo- 
ment and thus a strong electrostatic potential along their 
long axes, causing the HOMO and LUMO to be strongly 
localised to opposite ends of the rod. Thus for any semi- 
local approximation to the exchange-correlation kernel, 
one would expect the lowest excitation energy of the 
system to correspond to a charge transfer state across 
the rod. When calculating the lowest eigenvalue for the 
system using a fully dense response density matrix, this 
charge transfer state is exactly what we obtain. However, 
once a density matrix cutoff is introduced, the TDDFT 
algorithm converges to an excited state fully localised on 
the As terminated end of the rod and considerably higher 
in energy (see Fig. [5]). 

In Fig. |6l the energy convergence of the localised ex- 
cited state is plotted with respect to the density matrix 
truncation used. We find that although a density ma- 
trix cutoff does not allow us to converge charge-transfer 
type excitations, the more localised excitation on the As 
terminated end of the rod is determined to a high de- 
gree of accuracy. A density matrix truncation radius of 
40 ao introduces an error of less than 5 meV compared 
to the excitation calculated with the full density matrix, 
suggesting that calculating localised excitations with a 
truncated density matrix is indeed possible. 

We note that the charge-transfer excitations that can- 
not be recovered in our method using a response density 
matrix truncation are precisely the type of excitations 
that are significantly underestimated in TDDFT using 
local exchange-correlation kernels [111]. Thus excluding 
these states from a calculation might indeed be desired 
in certain systems, especially since they correspond to 
dark transitions and do not show up in the spectrum. We 
have shown that excluding these states can be achieved 
naturally in the linear-response TDDFT formulation pre- 
sented here by applying a suitable truncation on the re- 
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FIG. 5: The transition density of the lowest excitation of a GaAs nanorod as found for a truncated density matrix at 75 an 
(upper figure) and the full density matrix (lower figure). The excited state corresponding to the truncated response density 
matrix is 0.33 eV higher in energy than the one corresponding to the full density matrix. In this plot, H is shown in grey, As 
in yellow and Ga in purple. 
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FIG. 6: Lowest excitation energy of a GaAs nanorod as con- 
verged with different response density matrix trunctations. 



sponse density matrix. 



E. (10,0) Carbon nanotubes 

To demonstrate the linear scaling of the method with 
the number of atoms, a test system of a single-walled 
(10,0) carbon nanotubes (CNTs) in periodic boundary 
conditions is chosen. Supercell sizes of 640, 920, 1240, 
1600 and 1920 atoms are chosen, corresponding to seg- 
ments of 127, 193, 257, 321 and 386 an in length. Due to 
the periodic boundary conditions in place, all supercells 
simulate an infinitely long (10,0) CNT. 

There are well-known problems associated with us- 
ing local exchange-correlation kernels in infinite systems, 



which are widely discussed in the community 10]. Fur- 
thermore, the very delocalised nature of excitations in 
the infinite system means that the CNT is not an ideal 
candidate for introducing a cutoff on the response den- 
sity matrix, as seen in the previous section. The cal- 
culation performed here should therefore be regarded as 
a demonstration of linear-scaling capabilities only, while 
the previous sections provide a general demonstration for 
the accuracy of the method. 

The calculations were performed at a kinetic energy 
cutoff of 700 eV and only the lowest excitation energy 
was converged. As in previous sections, a minimal repre- 
sentation of 4 NGWFs per C atom was used for both the 
conduction and the valence NGWF sets. A localisation 
radius of 8.0 ao and 12.0 ao was selected for the valence 
and conduction NGWFs respectively. The number of un- 
occupied states included explicitly in the calculation was 
chosen such that all bound states were included and thus 
was scaled up linearly as the supercell size was increased. 
For the largest segment of 1920 atoms, this corresponds 
to a TDDFT operator of dimension 1.84 x 10 6 in canon- 
ical representation and 5.90 x 10 7 in conduction-valence 
NGWF representation, prohibitively large for any non- 
iterative treatment of the eigenvalue problem. In order 
to achieve full linear scaling in both the ground state 
and the TDDFT calculation, a cutoff radius of 35 ao was 
applied to both the valence and the conduction density 
matrix. 

The calculation time for a single iteration of the 
TDDFT conjugate gradient algorithm with respect to the 
different supercell sizes of (10,0) CNTs can be found in 
Fig. [7l Calculations have been performed for both a 
fully dense response matrix and a response matrix that 
has been truncated at 60 ao. It can be seen that with 
a moderate response matrix truncation of 60 ao, the cal- 
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Truncation: 60 Bohr 




1000 1500 
Number of atoms 

FIG. 7: Computation time in seconds for a single TDDFT 
iteration step vs. number of atoms for different supercell sizes 
of (10,0) CNTs. The calculations were performed on 72 cores. 
The red line is a cubic fit to the calculation time for a full 
response density matrix, while the blue line is a linear fit to 
the calculation time for a density matrix truncated at 60 ao. 



culation time of a TDDFT iteration scales fully linearly 
with system size. However from Fig. [7] it is also evident 
that even for fully dense response matrices, the algorithm 
exhibits a near linear scaling behaviour up to the largest 
supercell sizes. Thus for system sizes tested here, the con- 
struction of the response potential matrix V"^ x ^, which 
only depends on the density and thus scales linearly even 
for fully dense P^ 1 ^, dominates the computation time of 
the TDDFT algorithm. For even larger system sizes, it 
is expected that the cubic scaling associated with the 
fully dense matrix operations performed to construct the 
TDDFT gradient and conjugate search directions will 
start to strongly influence computation times, making a 
truncation of necessary. However, it is evident that 
the algorithm presented here exhibits an excellent scal- 
ing up to large system sizes (1920 atoms) even without 
enforcing the truncation of the response density matrix. 



IV. CONCLUSIONS 

We have presented a linear-scaling TDDFT algorithm 
in the linear response formalism. We have demonstrated 
the accuracy of the method on a number of test sys- 
tems by comparing to results in the literature obtained 
with conventional methods. The method presented in 
this work is ideal for systems in which the low energy ex- 
citation spectrum is dominated by a few very strong tran- 
sitions and only a relatively small number of dark states. 



For these systems, the advantages of an iterative treat- 
ment of the eigenvalue problem can be fully exploited 
and the method is expected to outperform standard time- 
evolution TDDFT algorithms. For systems with a very 
large number of dipole forbidden states, or nanocrystals 
with an indirect band gap, calculations become more de- 
manding since a much larger number of states need to be 
converged in order to produce a meaningful spectrum. 
However, while the orthogonality requirement of differ- 
ent excited states means that the algorithm cannot scale 
linearly but rather quadratically with the number of ex- 
citation energies converged, we note that the prefactor in 
the quadratic term is generally small, as demonstrated in 
the calculations on buckminsterfullerene. 

Furthermore, we have demonstrated that the method 
scales truly linearly with system size if all density matri- 
ces in the formalism can be treated as fully sparse. We 
have shown the validity of truncating the response den- 
sity matrix on GaAs nanorods for localised excitations, 
thus giving an example of a realistic system that can be 
studied while making full use of the advantages of the 
linear-scaling algorithm presented. While we find that 
the truncation of the response density matrix prevents 
us from calculating long-range charge transfer states, we 
note that these states are badly represented in local ap- 
proximations to the TDDFT exchange-correlation ker- 
nel in any case. A response density matrix truncation 
can thus provide an effective way of excluding unwanted 
charge transfer type states from the calculation. While 
we have shown that truncations of the response matrix 
are not always possible for excitations of arbitrary sys- 
tems, we note that the algorithm shows excellent scaling 
even for fully dense response density matrices up to a sys- 
tem size of over 2000 atoms. Thus, we expect the method 
to enable large scale computations of optical excitations 
in important areas such as biophysics and nanoscience. 
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